Data Interpolation: An Efficient Sampling Alternative for Big Data Aggregation

Authors

  • Hadassa Daltrophe ∗
  • Shlomi Dolev †
  • Zvi Lotker ‡

∗ Department of Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel. Partially supported by a Russian Israeli grant from the Israeli Ministry of Science and Technology and the Russian Foundation for Basic Research, the Rita Altura Trust Chair in Computer Sciences, the Lynne and William Frankel Center for Computer Sciences, Israel Science Foundation (grant number 428/11), Cabarnit Cyber Security MAGNET Consortium, Grant from the Institute for Future Defense Technologies Research named for the Medvedi of the Technion, MAFAT, and Israeli Internet Association.
† Department of Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel. Partially supported by Deutsche Telekom, Rita Altura Trust Chair in Computer Sciences, Lynne and William Frankel Center for Computer Sciences, Israel Science Foundation (grant number 428/11) and Cabarnit Cyber Security MAGNET Consortium.
‡ Department of Communication Systems Engineering, Ben-Gurion University, Beer-Sheva, 84105, Israel.

Abstract

Given a large set of sensor measurement data, we want to identify a simple function that captures the essence of the data gathered by the sensors. We suggest representing the data by (spatial) functions, in particular by polynomials. Given a (sampled) set of values, we interpolate the data points to define a polynomial that represents the data. The interpolation is challenging, since in practice the data can be noisy and even Byzantine, where Byzantine data is an adversarial value that is not constrained to be close to the correct measured data. We present two solutions: one extends the Welch-Berlekamp technique to multidimensional data and copes with discrete noise and Byzantine data; the other is based on Arora and Khot techniques, extending them to multidimensional noisy and Byzantine data.
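The Welch-Berlekamp idea mentioned in the abstract can be illustrated in one dimension. The following is a minimal sketch, not the authors' multidimensional algorithm: it assumes exact rational arithmetic (Python's `fractions.Fraction`), a univariate polynomial with `k` coefficients, and at most `e` Byzantine values among at least `k + 2e` samples. The helper names `solve_linear`, `poly_divmod`, and `berlekamp_welch` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Return one exact solution of A x = b (free variables are set to 0)."""
    rows, cols = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # column c has no pivot: it is a free variable
        M[r], M[pivot] = M[pivot], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    if any(M[i][cols] != 0 for i in range(r, rows)):
        raise ValueError("inconsistent system: too many corrupted points")
    x = [Fraction(0)] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    return x


def poly_divmod(num, den):
    """Divide polynomials stored as coefficient lists, lowest degree first."""
    num = list(num)
    quot = [Fraction(0)] * max(len(num) - len(den) + 1, 1)
    for shift in range(len(num) - len(den), -1, -1):
        factor = num[shift + len(den) - 1] / den[-1]
        quot[shift] = factor
        for j, d in enumerate(den):
            num[shift + j] -= factor * d
    return quot, num[:len(den) - 1]


def berlekamp_welch(points, k, e):
    """Recover the k coefficients of P from samples with at most e wrong values.

    Finds Q (k + e coefficients) and a monic error locator E (degree e) with
    Q(x_i) = y_i * E(x_i) for every sample, then returns P = Q / E.
    """
    if len(points) < k + 2 * e:
        raise ValueError("need at least k + 2e points")
    A, b = [], []
    for x, y in points:
        x, y = Fraction(x), Fraction(y)
        row = [x ** j for j in range(k + e)]      # unknown coefficients of Q
        row += [-y * x ** j for j in range(e)]    # unknown low coefficients of E
        A.append(row)
        b.append(y * x ** e)                      # monic leading term of E
    sol = solve_linear(A, b)
    Q, E = sol[:k + e], sol[k + e:] + [Fraction(1)]
    P, remainder = poly_divmod(Q, E)
    if any(c != 0 for c in remainder):
        raise ValueError("more than e corrupted points")
    return P


# Samples of P(x) = 1 + 2x + 3x^2 at x = 0..6, with two Byzantine values.
points = [(x, 1 + 2 * x + 3 * x * x) for x in range(7)]
points[1] = (1, 1000)
points[4] = (4, -5)
recovered = berlekamp_welch(points, k=3, e=2)
print([int(c) for c in recovered])  # [1, 2, 3]
```

The corrupted values may be arbitrarily far from the true measurements, which matches the Byzantine model in the abstract; what matters is only that their number stays within the assumed bound `e`. Handling bounded noise on every sample, or multidimensional data, requires the extensions the paper describes and is outside this sketch.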

Similar resources

Keynote Talk I Geometrical Approach to Big Data

The Big Data paradigm is one of the main science and technology challenges of today. Big data includes various data sets that are too large or too complex for efficient processing and analysis using traditional as well as unconventional algorithms and tools. The challenge is to derive value from signals buried in an avalanche of noise arising from challenging data volume, flow and validity. The...

Geometrical Approach to Big Data

The Big Data paradigm is one of the main science and technology challenges of today. Big data includes various data sets that are too large or too complex for efficient processing and analysis using traditional as well as unconventional algorithms and tools. The challenge is to derive value from signals buried in an avalanche of noise arising from challenging data volume, flow and validity. The...

Optimizing Window Aggregate Functions via Random Sampling

Window functions have been a part of the SQL standard since 2003 and have been well studied during the past decade. As the demand increases in analytics tools, window functions have seen an increasing amount of potential applications. Although the current mainstream commercial databases support window functions, the existing implementation strategies are inefficient for the real-time processing...

LOOM: Optimal Aggregation Overlays for In-Memory Big Data Processing

Aggregation underlies the distillation of information from big data. Many well-known basic operations including top-k matching and word count hinge on fast aggregation across large data-sets. Common frameworks including MapReduce support aggregation, but do not explicitly consider or optimize it. Optimizing aggregation however becomes yet more relevant in recent “online” approaches to expressiv...

Error-bounded Sampling for Analytics on Big Sparse Data

Aggregation queries are at the core of business intelligence and data analytics. In the big data era, many scalable shared-nothing systems have been developed to process aggregation queries over massive amounts of data. Microsoft’s SCOPE is a well-known instance in this category. Nevertheless, aggregation queries are still expensive, because query processing needs to consume the entire data set, ...

Journal:
  • CoRR

Volume: abs/1210.3171

Publication date: 2012